

Section: New Results

Analyzing and Reasoning on Heterogeneous Semantic Graphs

SPARQL Function

Participant : Olivier Corby.

We wrote a SHACL interpreter in the LDScript language. Within the SPARQL Function LDScript language [56], we introduced new datatypes for JSON and XML DOM. We also wrote technical documentation for the whole language: http://ns.inria.fr/sparql-extension.
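To illustrate the kind of check a SHACL interpreter performs, the following sketch validates an `sh:minCount`-style constraint over an in-memory triple set. This is a toy Python illustration of the idea only; the function and data names are invented and this is not the LDScript implementation.

```python
# Toy sketch of a SHACL-style sh:minCount check over an in-memory
# set of triples. Names are illustrative, not the LDScript API.

def validate_min_count(triples, focus_node, path, min_count):
    """Return True if focus_node has at least min_count values for path."""
    values = [o for (s, p, o) in triples if s == focus_node and p == path]
    return len(values) >= min_count

triples = {
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
}

# ex:alice conforms to a shape requiring one foaf:name; ex:bob does not.
print(validate_min_count(triples, "ex:alice", "foaf:name", 1))
print(validate_min_count(triples, "ex:bob", "foaf:name", 1))
```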

Ontology Alignment Approach Based on Embedded Space

Participants : Molka Dhouib, Catherine Faron Zucker, Andrea Tettamanzi.

In the framework of a collaborative project with the Silex France company, aiming to model the social network of service providers and companies, we developed last year, as a preliminary step, a dedicated vocabulary of competences and fields of activity to semantically annotate B2B service offers. This year, we proposed a new ontology alignment approach based on a set of rules that exploit the embedded space and measure clusters of labels to discover relationships between concepts. We tested our system on the OAEI conference complex alignment benchmark track and then applied it to aligning ontologies in a real-world case study of the Silex company. The experimental results show that the combination of word embeddings and the radius measure makes it possible to determine, with good accuracy, not only equivalence relations but also hierarchical relations between concepts. This work was presented at the 15th International Conference on Semantic Systems (SEMANTiCS 2019) [35].
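The centroid-and-radius intuition behind such rules can be sketched as follows: each concept is represented by the embedding vectors of its labels, and the distance between cluster centroids, compared against the cluster radii, suggests equivalence or a hierarchical relation. The vectors, threshold, and decision rule below are toy assumptions, not the actual Silex alignment system.

```python
# Illustrative sketch (not the published system): deciding a relation
# between two concepts from the centroids and radii of their
# label-embedding clusters. Vectors and threshold are toy values.
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def radius(vectors, c):
    return max(math.dist(v, c) for v in vectors)

def relation(cluster_a, cluster_b, eq_threshold=0.5):
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    ra, rb = radius(cluster_a, ca), radius(cluster_b, cb)
    d = math.dist(ca, cb)
    if d <= eq_threshold:
        return "equivalent"
    if d <= rb and ra < rb:  # A's tight cluster sits inside B's wider one
        return "subClassOf"
    return "unrelated"

# A narrow concept whose labels fall inside a broader concept's cluster:
narrow = [[2.0, 1.0], [2.2, 1.0]]
broad = [[1.0, 1.0], [3.0, 1.0], [-1.0, 1.0]]
print(relation(narrow, broad))
```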

Argument Mining and Argumentation Theory

Participants : Elena Cabrio, Shohreh Haddadan, Tobias Mayer, Milagro Teruel, Laura Alonso Alemany, Johanna Frau.

We have proposed an Argument Mining approach to political debates [23]. We have addressed this task in an empirical manner by annotating 39 political debates from the last 50 years of US presidential campaigns, creating a new corpus of 29k argument components, labeled as premises and claims. We then proposed two tasks: (1) identifying the argumentative components in such debates, and (2) classifying them as premises and claims. We showed that feature-rich SVM learners and Neural Network architectures outperform standard baselines in Argument Mining over such complex data. We released the new corpus USElecDeb60To16 and the accompanying software under free licenses to the research community. As a result of these findings, we have also realized the DISPUTool system [22]. The results of this research have been published at ACL 2019 and IJCAI 2019.
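As a purely didactic sketch of task (2), a trivial baseline can classify a component as premise or claim from surface discourse markers; the marker lists and default choice below are invented for illustration and bear no relation to the feature-rich SVM or neural models of the published work.

```python
# Toy marker-based baseline for premise/claim classification.
# Marker sets are illustrative placeholders, not the ACL 2019 features.

PREMISE_MARKERS = {"because", "since", "given"}
CLAIM_MARKERS = {"should", "must", "believe"}

def classify_component(text):
    tokens = set(text.lower().split())
    if tokens & PREMISE_MARKERS:
        return "premise"
    if tokens & CLAIM_MARKERS:
        return "claim"
    return "claim"  # arbitrary default for this toy example

print(classify_component("because unemployment has risen sharply"))
print(classify_component("we should lower taxes"))
```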

We have contributed to the definition of the ACTA tool, aiming at applying argument mining to clinical text, given the importance of argument-based decision making in medicine [26]. ACTA is a tool for automating the argumentative analysis of clinical trials. The tool is designed to support doctors and clinicians in identifying the document(s) of interest about a certain disease, and in analyzing the main argumentative content and PICO elements. The results of this research have been published at IJCAI 2019.

Finally, together with Laura Alonso Alemany (Univ. Cordoba), Johanna Frau (Univ. Cordoba) and Milagro Teruel (Univ. Cordoba), we evaluated different attention mechanisms applied over a state-of-the-art architecture for sequence labeling [18]. Argument mining is a rising area of Natural Language Processing (NLP) concerned with the automatic recognition and interpretation of argument components and their relations. Neural models are by now mature technologies for automating argument mining tasks, despite the issue of data sparseness, and could ease much of the manual effort involved in these tasks across heterogeneous types of texts and topics. We assessed the impact of different flavors of attention on the task of argument component detection over two datasets: essays and the legal domain. We showed that attention not only models the problem better but also supports interpretability. The results of this research have been published at FLAIRS 2019.
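The core of any such attention flavor is a softmax-weighted combination of token representations. The minimal dot-product sketch below uses toy vectors and is not the FLAIRS 2019 architecture; it only shows the mechanism being compared.

```python
# Minimal dot-product attention over token vectors (toy sketch, not the
# sequence-labeling architecture from the paper).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Weight each value vector by the softmax of query.key scores."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# The query matches the first key, so the output is close to values[0].
out = attention([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])
print(out)
```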

Mining and Reasoning on Legal Documents

Participants : Serena Villata, Cristian Cardellino, Milagro Teruel, Laura Alonso Alemany, Guido Governatori, Leendert Van Der Torre, Beishui Liao, Nir Oren.

Together with Cristian Cardellino (Univ. Cordoba), Santiago Marro (Univ. Cordoba), Milagro Teruel (Univ. Cordoba) and Laura Alonso Alemany (Univ. Cordoba), we adapted the semi-supervised deep learning architecture known as Convolutional Ladder Networks, from the domain of computer vision, and explored how well it works for a semi-supervised Named Entity Recognition and Classification task with legal data. The idea of exploring a semi-supervised technique is to assess the impact of large amounts of unsupervised data (cheap to obtain) in specific tasks that have little annotated data, in order to develop robust models that are less prone to overfitting. To achieve this, we first checked the impact on a task that is easier to measure. Although the results presented are preliminary, the experiments carried out revealed some interesting insights that foster further research on the topic. The results of this research have been published at FLAIRS 2019 [9].

Together with some colleagues from Data61 Queensland (Australia) and Antonino Rotolo (University of Bologna), Serena Villata proposed a framework for modelling legislative deliberation in the form of dialogues. Roughly, in legislative dialogues coalitions can dynamically change and propose rule-based theories associated with different utility functions, depending on the legislative theory the coalitions are trying to determine. The results of this research have been published at ICAIL 2019 [21].

Finally, together with Nir Oren (Univ. Aberdeen), Leendert van der Torre (Univ. Luxembourg) and Beishui Liao (Univ. Zhejiang), we defined, using hierarchical abstract normative systems (HANS), three kinds of prioritized normative reasoning approaches called Greedy, Reduction and Optimization. Then, after formulating an argumentation theory for a HANS, we showed that for a totally ordered HANS, Greedy and Reduction can be represented in argumentation by applying the weakest link and the last link principles, respectively, and Optimization can be represented by introducing additional defeats capturing the idea that any argument containing a norm outside the maximal obeyable set should be rejected. The results of this research have been published in the Journal of Logic and Computation [3].
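To give a flavor of the Greedy approach on a totally ordered set of norms: process norms in decreasing priority and adopt each applicable obligation unless it conflicts with one already adopted. The encoding below (conditions as strings, negation as a `-` prefix) is a drastically simplified toy, not the HANS formalism of the paper.

```python
# Toy sketch of "Greedy" prioritized normative reasoning over a totally
# ordered list of norms (condition, obligation); simplified encoding,
# not the full HANS framework from the Journal of Logic and Computation.

def negate(literal):
    return literal[1:] if literal.startswith("-") else "-" + literal

def greedy(norms, facts):
    """norms are in decreasing priority; adopt an obligation only if its
    condition holds and its negation has not already been adopted."""
    derived = set()
    for condition, obligation in norms:
        applicable = condition is None or condition in facts | derived
        if applicable and negate(obligation) not in derived:
            derived.add(obligation)
    return derived

# A higher-priority prohibition blocks a lower-priority obligation:
norms = [(None, "-build_fence"), ("has_dog", "build_fence")]
print(greedy(norms, {"has_dog"}))
```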

Natural Language Processing of Song Lyrics

Participants : Michael Fell, Elena Cabrio, Fabien Gandon, Alain Giboin.

We advanced our work in the WASABI ANR project in two directions. First, we tackled the problem of summarizing song lyrics. Given the peculiar structure of songs, applying generic text summarization methods to lyrics can lead to the generation of highly redundant and incoherent text. We thus proposed to enhance state-of-the-art text summarization approaches with a method inspired by audio thumbnailing. We showed that these summaries, which take into account the audio nature of the lyrics, outperform the generic methods according to both an automatic evaluation and human judgments. This work resulted in a RANLP publication [17]. Second, we investigated the task of detecting swear words and other potentially harmful content in lyrics. The Parental Advisory Label (PAL) is a warning label placed on audio recordings in recognition of profanity or inappropriate references, with the intention of alerting parents to material potentially unsuitable for children.
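The audio-thumbnailing intuition is that a song's most repeated segment (typically the chorus) is its most representative part. The sketch below applies that idea at the line level as a toy illustration; it is not the RANLP 2019 system, which combines this intuition with state-of-the-art text summarization.

```python
# Toy illustration of the audio-thumbnailing idea carried over to lyrics:
# the most repeated line stands in for the "thumbnail" of the song.
from collections import Counter

def thumbnail_summary(lyric_lines):
    """Return the most frequently repeated line, or the first line if
    nothing repeats."""
    counts = Counter(line.strip().lower() for line in lyric_lines if line.strip())
    line, freq = counts.most_common(1)[0]
    return line if freq > 1 else lyric_lines[0].strip().lower()

song = [
    "Walking down the empty street",
    "Hold on to the light",
    "Thinking of the ones I meet",
    "Hold on to the light",
]
print(thumbnail_summary(song))
```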

Since 2015, digital providers such as iTunes, Spotify, Amazon Music and Deezer also follow PAL guidelines and tag such tracks as explicit.

Nowadays, such labelling is carried out mainly manually and on a voluntary basis, with the drawbacks of being time-consuming (and therefore costly), error-prone and partly subjective. We therefore compared automated methods, ranging from dictionary-based lookup to state-of-the-art deep neural networks, to automatically detect explicit content in English lyrics. We showed that more complex models perform only slightly better on this task and, relying on a qualitative analysis of the data, we discussed its inherent hardness and subjectivity. This work was published at the RANLP conference [16]. We are currently modelling emotion in song lyrics, focusing on the hierarchical and sequential structure of these texts, in which lines make up segments, which in turn make up the full lyric, and later parts may be perceived differently in light of the emotion caused by previous parts.
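The simplest end of the compared spectrum, dictionary-based lookup, can be sketched in a few lines. The word list below is a harmless placeholder, not an actual profanity lexicon, and the tokenization is deliberately naive.

```python
# Sketch of the dictionary-lookup baseline for explicit-content detection.
# The lexicon is a placeholder; a real system would use a curated list.

EXPLICIT_WORDS = {"badword1", "badword2"}  # placeholder lexicon

def is_explicit(lyrics):
    tokens = {t.strip(".,!?").lower() for t in lyrics.split()}
    return bool(tokens & EXPLICIT_WORDS)

print(is_explicit("this line contains badword1 somewhere"))
print(is_explicit("a perfectly clean line"))
```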

RDF Mining

Participants : Thu Huong Nguyen, Andrea Tettamanzi.

In collaboration with our former PhD student Tran Duc Minh, Claudia d'Amato of the University of Bari, and Nguyen Thanh Binh of Danang University, we made a comparison of rule evaluation metrics for EDMAR, our evolutionary approach to discovering multi-relational rules from ontological knowledge bases exploiting the services of an OWL reasoner [36].

In the framework of Nguyen Thu Huong's thesis, we have proposed a grammar-based evolutionary method to mine RDF datasets for OWL class disjointness axioms [31], [30].
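A mined disjointness axiom DisjointClasses(A, B) can be scored against RDF data by how few individuals are instances of both classes. The scoring function below is a simplified toy illustration of that idea, not the grammar-based evolutionary method or its actual fitness measure.

```python
# Toy scoring of a candidate OWL class disjointness axiom against data:
# a score of 1.0 means no individual is asserted in both classes.
# Simplified illustration, not the fitness function of [31], [30].

def disjointness_support(instances_a, instances_b):
    """Score DisjointClasses(A, B) in [0, 1] from two instance sets."""
    if not instances_a or not instances_b:
        return 0.0  # no evidence either way
    overlap = len(instances_a & instances_b)
    return 1.0 - overlap / min(len(instances_a), len(instances_b))

# Fully disjoint classes score 1.0; a shared instance lowers the score.
print(disjointness_support({"ex:i1", "ex:i2"}, {"ex:i3"}))
print(disjointness_support({"ex:i1", "ex:i2"}, {"ex:i2", "ex:i3"}))
```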

Machine Learning for Operations Research

Participant : Andrea Tettamanzi.

Together with Alberto Ceselli and Saverio Basso of the University of Milan we used machine learning techniques to understand good decompositions of linear programming problems [1].

Image Recognition with Semantic Data

Participants : Anna Bobasheva, Fabien Gandon, François Raygagne, Frédéric Precioso.

The objective of the MonaLIA 2.0 project is to exploit the crossover between Deep Learning methods for image analysis and knowledge-based representation and reasoning, and to apply it to the semantic indexing of annotated works and images in the JocondeLab dataset. The goal is to identify automated or semi-automated tasks that improve annotation and information retrieval. This project was an 11-month contract with the Ministry of Culture, plus a 6-month internship.

Results were presented at the Atelier Culture-Inria on December 2nd, at the Institut national d'histoire de l'art in Paris.

Hospitalization Prediction

Participants : Raphaël Gazzotti, Catherine Faron Zucker, Fabien Gandon.

HealthPredict is a project conducted in collaboration with the Département d'Enseignement de Recherche en Médecine Générale (DERMG) at Université Côte d'Azur and the SynchroNext company. It aims at providing a digital health solution for the early management of patients through consultation with their general practitioner and the health care circuit. Concretely, it is a predictive Artificial Intelligence interface that makes it possible to cross-reference data on the symptoms, diagnoses and medical treatments of the population in real time to predict the hospitalization of a patient. We proposed and evaluated different ways to enrich the features extracted from electronic medical records (EMRs) with ontological resources before turning them into vectors used by Machine Learning algorithms to predict hospitalization. We reported and discussed the results of our first experiments on the PRIMEGE PACA database at EGC 2019 [38] and ESWC 2019 [19]. We also proposed a semi-supervised approach based on DBpedia to extract medical subjects from EMRs and evaluated the impact of augmenting the features used to represent EMRs with these subjects in the task of predicting hospitalization. These results will be presented at SAC 2020 [61]. Finally, we designed an interface to assist the decision-making process of general practitioners, allowing them to identify in patients the first signs that lead to hospitalization and the medical problems to be treated as a priority. This interface has been presented in [55].
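One way such ontological enrichment works is to expand each concept detected in a record with its ancestors in a hierarchy, so that records mentioning sibling concepts share features. The hierarchy and concept names below are invented placeholders; this is a sketch of the general idea, not the actual DBpedia-based pipeline.

```python
# Sketch of enriching EMR features with ontology ancestors: a record
# mentioning a specific concept also gets its more general concepts as
# features. Hierarchy and names are toy placeholders.

HIERARCHY = {  # child -> parent (illustrative concepts only)
    "type_2_diabetes": "diabetes",
    "type_1_diabetes": "diabetes",
    "diabetes": "metabolic_disorder",
}

def enrich(features):
    """Return the feature set closed under the child -> parent relation."""
    enriched = set(features)
    frontier = list(features)
    while frontier:
        parent = HIERARCHY.get(frontier.pop())
        if parent and parent not in enriched:
            enriched.add(parent)
            frontier.append(parent)
    return enriched

# Two records with different diabetes types now share two features:
print(sorted(enrich({"type_2_diabetes"})))
```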

Learning Analytics and Adaptive Learning

Participants : Oscar Rodríguez Rocha, Catherine Faron Zucker.

We developed semantic queries to analyse the student activity data available in the Educlever knowledge graph and the SIDES knowledge graph, showing the added value of Semantic Web modelling enabling ontology-based reasoning. The results of our analysis of the SIDES knowledge graph have been presented at the 2019 French workshop on AI and Health [39].

The faculties of medicine, all grouped together under the auspices of the Conférence des doyens, are collectively proposing to upgrade the SIDES solution to an innovative solution called Intelligent Health Education System 3.0 (SIDES 3.0). As part of this community-based approach, the coordination of the project will be carried out by the Université Numérique Thématique (UNT) en Santé et Sport, the GIP UNESS.fr. This structure offers an ideal national positioning for support and coordination of training centers (UFR) and also offers long-term financial sustainability.

In particular, Inria, through the Wimmics research team, focuses on recommending existing questions to students according to their profile. For this, research activities are carried out to classify the questions on the platform by difficulty level according to Bloom's revised taxonomy, considering the information contained in the text of each question. Research activities have also focused on predicting the probability of students' outcomes on questions, considering previous answers stored in the SIDES graph.

With the ultimate goal of recommending resources adapted to the student's profile and context, we developed an approach to predict the success of students when answering training or test questions by learning a student model from the SIDES knowledge graph. To learn a user model from the SIDES knowledge graph, we combine state-of-the-art features with node embeddings. Our first results will be presented at SAC 2020.

The level of complexity and specificity of the learning objective associated with a question may be a key criterion to integrate in the recommendation process. For this purpose, we developed an approach to classify the questions of the SIDES platform according to the reference Bloom's taxonomy, by extracting the level of complexity and specificity of their learning objectives from their textual descriptions with semantic rules.
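A common rule-based approach to this kind of classification keys on indicator verbs in the learning objective. The sketch below shows the shape of such rules; the verb lists and level names are generic Bloom-taxonomy examples, not the actual SIDES semantic rules.

```python
# Toy rule-based classifier mapping a question's text to a level of
# Bloom's revised taxonomy via indicator verbs. Verb lists are generic
# illustrations, not the SIDES rule set.

BLOOM_VERBS = {
    "remember":   {"list", "define", "name", "recall"},
    "understand": {"explain", "describe", "summarize"},
    "apply":      {"compute", "calculate", "use"},
    "analyze":    {"compare", "differentiate", "examine"},
}

def bloom_level(question):
    tokens = set(question.lower().split())
    for level, verbs in BLOOM_VERBS.items():
        if verbs & tokens:
            return level
    return "unclassified"

print(bloom_level("Explain the mechanism of insulin resistance"))
print(bloom_level("List the cranial nerves"))
```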